Interactive Notebook: SEG 2014 Chevron Full Waveform Inversion Synthetic dataset

IPython Notebook updated by Brendan Smithyman on April 25, 2014

The original dataset is available from Chevron here.

This IPython notebook is licensed under Creative Commons Attribution-ShareAlike 2.5 Canada. The SEG 2014 Chevron Full Waveform Inversion Synthetic dataset is licensed separately; see the license agreement with the data and/or the reproduction at the bottom of this worksheet. The SEG-Y library used is from pygeo, and is distributed under the GNU Lesser GPL.

Data Location


In [1]:
# Set the data directory here to point to your copy of the Chevron dataset
DATA_DIRECTORY = 'data'

Readme File


In [2]:
with open(DATA_DIRECTORY + '/README', 'r') as fp:
    print(fp.read())


SEG2014 Workshop 2D MARINE ISOTROPIC ELASTIC SYNTHETIC SEISMIC BENCHMARK
April 2014

Package contains:
	This Readme file
	One SEGY seismic data file:             SEG14.Pisoelastic.segy: 4,234,122,000 bytes (dX = 25 m)
	One SEGY smooth starting velocity file: SEG14.Vpsmoothstarting.segy (dX = 12.5 m)
	One Vp well log 1000 < Depth < 2450 m:  SEG14.Vplog (dZ = 5 m)
	One farfield wholespace (no ghost) wavelet file:  Wavelet.txt (0.666666 ms sampling)
        One picture of the wavelet and the well log:      WellWavePic.JPG
	Legal Disclaimer (ascii)

General Information:
	1600 shots:               dS = 25 m, Source depth = 15 m
	321 hydrophone recs/shot: dR = 25 m, Recevr depth = 15 m
	Max offset = 8000 m
	Record time = 8.0 s, sample rate 4 ms
	Vp water = constant = 1510 m/s
	With free surface multiples present in the data
	Isotropic Elastic

Questions to:  Joe Stefani    joestefani@chevron.com

General Setup


In [3]:
%pylab
%matplotlib inline
import matplotlib
from IPython.display import set_matplotlib_formats
set_matplotlib_formats('png')
matplotlib.rcParams['savefig.dpi'] = 300 # Change this to adjust figure size


Using matplotlib backend: MacOSX
Populating the interactive namespace from numpy and matplotlib

In [4]:
# Plotting options
font = {
    'family': 'Bitstream Vera Sans',
    'weight': 'normal',
    'size': 12,
}

matplotlib.rc('font', **font)

In [5]:
# import pygeo SEG-Y library
from segyread import SEGYFile

In [6]:
import scipy.ndimage as ndimage

In [7]:
# Whether to display diagnostic output when reading the SEG-Y file
SEGY_VERBOSE=True

In [8]:
# Functions to convert to local unit in km
lu = 'km' # label for figure axes

if lu == 'km':
    m2lu = lambda x: x * 1e-3
    km2lu = lambda x: m2lu(x) * 1e3
    xlabelspacing = 1
else:
    lu = 'm'
    m2lu = lambda x: x
    km2lu = lambda x: m2lu(x) * 1e3
    xlabelspacing = 2

# Function to make array
arr = lambda x: np.array(x)

In [9]:
# Grid size
DX, DZ = [12.5, 5.0] # m
DX_SRC = m2lu(25) # m
DX_REC = m2lu(25) # m

# Plotting extents
MODEL_EXTENT = km2lu(arr((0, 47.7375, 5.995, 0))) # x0, x1, z1, z0; all in km
WELL_EXTENT = (1500, 4500, m2lu(1000), m2lu(2450)) # v0, v1 in m/s; z0, z1 in km
WAVELET_EXTENT = (0, 0.250, -1.1, 1.1) # t0, t1 in ms; relative amps
WAVELET_SAMPR = 2. / 3e3 # sample rate in seconds
SPECTRUM_EXTENT = (0, 50, 0, 1.1) # f0, f1, p0, p1

# Array depths
SOURCE_DEPTH = m2lu(15) # m
REC_DEPTH = m2lu(15) # m

# Well position
WELL_X = m2lu(39375) # m
WELL_INDEX = 3150

# Velocity range
VMIN = 1500 # m/s
VMAX = 4500 # m/s

# Array parameters
SOURCES = 1600
CHANNELS = 321

In [10]:
# From: http://stackoverflow.com/a/312464
def chunks(l, n):
    for i in xrange(0, len(l), n):
        yield l[i:i+n]

Interactive Data Overview

What follows is an interactive overview of the model, data and accompanying materials provided by Chevron and the SEG for the SEG 2014 Chevron Full Waveform Inversion Synthetic dataset.

Well Sonic Log


In [11]:
with open(DATA_DIRECTORY + '/SEG14.Vplog') as fp:
    lines = fp.readlines()
wellLog = np.array([[float(item) for item in line.strip().split()[2::3]] for line in lines[1:]])
wellLog[:,0] = m2lu(wellLog[:,0])
smoothLog = ndimage.median_filter(wellLog[:,1], 10) # Short-period median filter
smootherLog = ndimage.median_filter(wellLog[:,1], 100) # Long-period median filter
gaussianLog = ndimage.gaussian_filter(wellLog[:,1], 10) # Gaussian filter
wellPos = np.array([[WELL_X, wellLog[0,0]], [WELL_X, wellLog[-1,0]]])

In [12]:
aspect = 2e3

fig = plt.figure()
ax = fig.add_subplot(1,1,1, aspect=aspect)
ax.xaxis.set_label_text('Velocity [m/s]')
ax.yaxis.set_label_text('Depth [%s]'%lu)
ax.set_title('Well Sonic Log')
plt.xticks(np.linspace(2e3,4e3,5))
ax.axis(WELL_EXTENT)
ax.invert_yaxis()
# pt = ax.plot(smootherLog, wellLog[:,0], 'b-', linewidth=2, label='100-point median filter')
pt = ax.plot(smoothLog, wellLog[:,0], 'b-', label='10-point median filter')
# pt = ax.plot(gaussianLog, wellLog[:,0], 'r-', label='Gaussian filter')
pt = ax.plot(wellLog[:,1], wellLog[:,0], 'k-', label='Sonic log velocities', linewidth=0.5)
ax.legend(loc='upper left', bbox_to_anchor=(1.1, 1.0), fancybox=True, shadow=True, ncol=1)


Out[12]:
<matplotlib.legend.Legend at 0x10c149b10>

Velocity Model


In [13]:
# Load the velocity model
sfVelocity = SEGYFile(DATA_DIRECTORY + '/SEG14.Vpsmoothstarting.segy', endian='Big', verbose=SEGY_VERBOSE)

# Get an array containing the model
vel0 = sfVelocity[:]

# Print out useful information
print('Model size: %d×%d'%(sfVelocity.ntr, sfVelocity.ns))
print('dz from headers: %4.3f %s'%(m2lu(sfVelocity.bhead['hdt'] * 1e-3),lu))


Detecting machine endianness...
Little.

Trying to create memory map...
Success. Using memory-mapped I/O.

Reading SEG-Y headers...
Read SEG-Y headers.
	3820 traces present.

Big endian specified... Not autodetecting.
Big endian != Little endian, therefore Foreign.
FORMAT == 1
             ...converting from IBM floating point.

Model size: 3820×1200
dz from headers: 0.005 km

In [14]:
# Example for grabbing geometry from headers
scalco = sfVelocity.trhead[0]['scalco']
if scalco < 0:
    scalco = -1. / scalco
xVec = m2lu(np.array([trh['sx'] * scalco for trh in sfVelocity.trhead]))

In [15]:
# Set aspect ratio (vertical exaggeration)
aspect = 3

fig = plt.figure()
ax = fig.add_subplot(1,1,1, aspect=aspect)
ax.xaxis.set_label_text('X [%s]'%lu)
ax.yaxis.set_label_text('Depth [%s]'%lu)
ax.set_title('Initial Velocity Model')
plt.xticks(arange(0, MODEL_EXTENT[1], km2lu(5)*xlabelspacing))
plt.yticks(arange(0, MODEL_EXTENT[2]+km2lu(1), km2lu(1.5)))
plotopts = {
    'vmin': VMIN,
    'vmax': VMAX,
    'extent': MODEL_EXTENT,
    'aspect': aspect,
}
im = ax.imshow(vel0.T, **plotopts)

ax.plot(wellPos[:,0], wellPos[:,1], 'k-', linewidth=1)
aprops = {
    'boxstyle': 'rarrow',
    'fc': 'w',
    'ec': 'k',
    'lw': 1,
}
twell = ax.text(wellPos[1,0] - km2lu(1.5), wellPos[1,1] - km2lu(0.75), 'Exploration Well', ha='right', va='center', rotation=0, size=6, bbox=aprops)
tve = ax.text(0.02, 0.05,'VE %d:1'%aspect, ha='left', va='bottom', transform = ax.transAxes)
ax.axis(MODEL_EXTENT)

cbax = colorbar(im, orientation='horizontal', shrink=0.8)
cbax.set_ticks(arange(VMIN, VMAX+1, 500))
cbax.set_label('Velocity [m/s]')
fig.tight_layout()



In [16]:
print('Velocity range: %6.1f%6.1f m/s'%(vel0.min(), vel0.max()))


Velocity range: 1510.0–4218.9 m/s

In [17]:
aspect = 1e3

fig = plt.figure()
ax = fig.add_subplot(1,1,1, aspect=aspect)
ax.xaxis.set_label_text('Velocity [m/s]')
ax.yaxis.set_label_text('Depth [%s]'%lu)
ax.set_title('Well Sonic Log')
plt.xticks(np.linspace(2e3,4e3,3))
axmod = list(WELL_EXTENT)
axmod[2] = MODEL_EXTENT[3]
axmod[3] = MODEL_EXTENT[2]
ax.axis(axmod)
ax.invert_yaxis()
# pt = ax.plot(smootherLog, wellLog[:,0], 'b-', linewidth=2, label='100-point median filter')
# pt = ax.plot(smoothLog, wellLog[:,0], 'b-', label='10-point median filter')
# pt = ax.plot(gaussianLog, wellLog[:,0], 'r-', label='Gaussian filter')
pt = ax.plot(wellLog[:,1], wellLog[:,0], 'k-', label='Sonic log velocities', markersize=1.0, linewidth=0.5)
pt = ax.plot(sfVelocity[WELL_INDEX], np.linspace(MODEL_EXTENT[3], MODEL_EXTENT[2], sfVelocity.ns), 'm-', label='Initial model')
ax.legend(loc='upper left', bbox_to_anchor=(1.1, 1.0), fancybox=True, shadow=True, ncol=1)


FORMAT == 1
             ...converting from IBM floating point.

Out[17]:
<matplotlib.legend.Legend at 0x10ce6d650>

Source Wavelet


In [18]:
with open(DATA_DIRECTORY + '/Wavelet.txt') as fp:
    lines = fp.readlines()

sourceWavelet = np.array([float(line) for line in lines[1:]])
sourceWaveletTime = np.linspace(0, WAVELET_SAMPR*(len(sourceWavelet)-1), len(sourceWavelet))
sourceWavelet = np.array([sourceWaveletTime, sourceWavelet]).T

sourceSpectrum = np.abs(np.fft.fft(sourceWavelet[:,1]))
freqs = np.fft.fftfreq(len(sourceWavelet), WAVELET_SAMPR)
idx = np.argsort(freqs)
idx = idx[len(idx) / 2:]

In [19]:
fig = plt.figure()

ax1 = fig.add_subplot(2,1,1, aspect=0.3e-1)
ax1.xaxis.set_label_text('Time (s)')
ax1.yaxis.set_label_text('Amplitude')
ax1.set_title('Source wavelet')
ax1.axis(WAVELET_EXTENT)
pt = ax1.plot(sourceWavelet[:,0], sourceWavelet[:,1] / abs(sourceWavelet[:,1]).max(), 'g-')

ax2 = fig.add_subplot(2,1,2)
ax2.xaxis.set_label_text('Frequency (Hz)')
ax2.yaxis.set_label_text('Norm. amp.')
ax2.set_title('Amplitude spectrum')
ax2.axis(SPECTRUM_EXTENT)
fi = ax2.fill(freqs[idx], sourceSpectrum[idx] / abs(sourceSpectrum).max(), 'c')
pt = ax2.plot(freqs[idx], sourceSpectrum[idx]  / abs(sourceSpectrum).max(), 'b')

fig.tight_layout()


SEG-Y Data Traces


In [20]:
# Load the seismic data traces
sfPiso = SEGYFile(DATA_DIRECTORY + '/SEG14.Pisoelastic.segy', endian='Big', verbose=SEGY_VERBOSE)

dt = sfPiso.trhead[0]['dt'] / 1e3 # ms

# Print out useful information
print('\nNumber of traces: %d\nNumber of samples per trace: %d\nSample rate: 1/[%d ms]'%(sfPiso.ntr, sfPiso.ns, dt))


Detecting machine endianness...
Little.

Trying to create memory map...
Success. Using memory-mapped I/O.

Reading SEG-Y headers...
Read SEG-Y headers.
	513600 traces present.

Big endian specified... Not autodetecting.
Big endian != Little endian, therefore Foreign.

Number of traces: 513600
Number of samples per trace: 2001
Sample rate: 1/[4 ms]

In [21]:
# Example for grabbing geometry from headers
scalco = sfPiso.trhead[0]['scalco']
if scalco < 0:
    scalco = -1. / scalco

# NB: Strictly, these can be different. Assume they match.
# scalel = sfPiso.trhead[0]['scalel']
# if scalel < 0:
#     scalel = -1. / scalel

In [22]:
# geomSrc = sf.trhead[::CHANNELS]
xSrc = m2lu(arr([float(trh['sx'])*scalco for trh in sfPiso.trhead[::CHANNELS]]))

In [23]:
xRec = {i: [m2lu(float(trh['gx'])*scalco) for trh in trhs] for i, trhs in enumerate(chunks(sfPiso.trhead, CHANNELS))}

In [24]:
xRecStations = set()

In [25]:
for recs in xRec.values():
    xRecStations.update(set(recs))
xRecStations = list(xRecStations)
xRecStations.sort()
receivers = len(xRecStations)
print('There are %d sources, each of which is recoded on %d channels.\nThere are %d unique receiver locations.'%(SOURCES, CHANNELS, receivers))


There are 1600 sources, each of which is recoded on 321 channels.
There are 1920 unique receiver locations.

In [26]:
shot = 800
sgather = sfPiso[shot*CHANNELS : (shot+1)*CHANNELS]
sgatherN = sfPiso.sNormalize(sgather) # Trace normalization (ensemble scale)
offset = xRec[shot] - xSrc[shot]

# Set aspect ratio (vertical exaggeration)
aspect = 1e-3
colormap = cm.bwr

fig = plt.figure()

ax = fig.add_subplot(1,1,1, aspect=aspect)
ax.xaxis.set_label_text('Offset [%s]'%lu)
ax.yaxis.set_label_text('Time [ms]')
ax.set_title('Shot #%d'%shot)
plotopts = {
    'vmin': -0.2,
    'vmax': 0.2,
    'aspect': aspect,
    'cmap': colormap,
    'extent': (offset[0], offset[-1], dt * (sfPiso.ns - 1), 0)
}
ax.imshow(sgatherN.T, **plotopts)


FORMAT == 1
             ...converting from IBM floating point.

Normalizing each trace to unit amplitude.

Out[26]:
<matplotlib.image.AxesImage at 0x10f6661d0>

In [27]:
kx = np.fft.fftshift(np.fft.fftfreq(CHANNELS, DX_REC))
freqs = np.fft.fftshift(np.fft.fftfreq(sfPiso.ns, dt * 1e-3))
freqcut = len(freqs) / 2
freqs = freqs[freqcut:]
sgatherF = np.fft.fftshift(np.fft.fft2(sgather.T))
sgatherF = sgatherF[freqcut:,:]

# Set aspect ratio (vertical exaggeration)
aspect = 5e-1

fmax = SPECTRUM_EXTENT[1] # Hz
idx = freqs <= fmax
freqs = freqs[idx]

fig = plt.figure()

ax = fig.add_subplot(1,1,1, aspect=aspect)
ax.xaxis.set_label_text('Horizontal wavenumber [1 / %s]'%lu)
ax.yaxis.set_label_text('Frequency [Hz]')
ax.set_title('Shot #%d'%shot)
plotopts = {
    'aspect': aspect,
    'cmap': cm.jet,
    'extent': (kx[0], kx[-1], freqs[-1], freqs[0]),
    'vmin': 0,
    'vmax': 1e-2,
}
im = ax.imshow(abs(fliplr(sgatherF[idx,:])), **plotopts)



In [28]:
traceoffset = 0
cogather = sfPiso[traceoffset::CHANNELS]
cogatherN = sfPiso.sNormalize(cogather) # Trace normalization (ensemble scale)
midpoint = (xSrc + arr([xRec[i][traceoffset] for i in xrange(SOURCES)])) / 2
offset = xRec[0][traceoffset] - xSrc[0]

# Set aspect ratio (vertical exaggeration)
aspect = 4e-3
colormap = cm.gray

fig = plt.figure()

ax = fig.add_subplot(1,1,1, aspect=aspect)
ax.xaxis.set_label_text('Midpoint [%s]'%lu)
ax.yaxis.set_label_text('Time [ms]')
ax.set_title('Common offset %d %s'%(offset, lu))
plotopts = {
    'vmin': -5e-7,
    'vmax': 5e-7,
    'aspect': aspect,
    'cmap': colormap,
    'extent': (midpoint[0], midpoint[-1], dt * (sfPiso.ns - 1), 0)
}
ax.imshow(cogather.T, **plotopts)


FORMAT == 1
             ...converting from IBM floating point.

Normalizing each trace to unit amplitude.

Out[28]:
<matplotlib.image.AxesImage at 0x10f6ca2d0>

In [29]:
liveChannels = np.zeros((SOURCES, receivers))

# We could go through all of the source and receiver locations and match them up with
# the corresponding source/receiver cell in this array. However, in this case the data
# are nice and synthetic, so it's pretty straightforward to generate this mask manually.
# However, for field data in general, it is necessary to figure out where each pair
# of source and receiver fits into this (since there are often missing traces, esp. on land).

for i in xrange(SOURCES):
    for j in xrange(i, i + CHANNELS):
        liveChannels[i,j] = 1

In [30]:
fig = plt.figure()

# Subplot showing axes in terms of source / receiver number
# ... also plot selected source and common offset
ax = fig.add_subplot(1,2,1)
ax.xaxis.set_label_text('Receiver Station')
ax.yaxis.set_label_text('Source Station')
ax.set_title('Live: By Station')
ext = (0, receivers, SOURCES, 0)
im = ax.imshow(liveChannels, cmap=cm.gray_r, extent=ext)
ax.plot([shot, shot+CHANNELS], [shot, shot], 'r-')
ax.plot([traceoffset, traceoffset+SOURCES], [0, SOURCES], 'y--')
ax.axis(ext)

# Subplot showing axes in terms of X coordinate
# ... also plot selected source and common offset
ax = fig.add_subplot(1,2,2, aspect=50)
ax.xaxis.set_label_text('Receiver X [%s]'%lu)
ax.yaxis.set_label_text('Source X [%s]'%lu)
ax.set_title('Live: By Position')
plt.xticks(arange(0, MODEL_EXTENT[1], km2lu(8)*xlabelspacing))
ext = np.array([xRecStations[0], xRecStations[-1], xSrc[-1], xSrc[1]])
im = ax.imshow(liveChannels, cmap=cm.gray_r, extent=ext)
ax.plot([xRec[shot][0], xRec[shot][-1]], [xSrc[shot], xSrc[shot]], 'r-')
ax.plot([xRec[0][traceoffset], xRec[SOURCES-1][traceoffset]], [xSrc[0], xSrc[-1]], 'y--')
ax.axis(ext)

fig.tight_layout()


Chevron Data License Agreement


In [31]:
with open(DATA_DIRECTORY + '/DataLicenseAgreement', 'r') as fp:
    print(fp.read())


DATA LICENSE AGREEMENT

Do not separate this Data License Agreement from this Data.
Make sure you read and understand this document fully.

Subject to the terms and conditions below, Chevron hereby extends to you ("User")
a limited, non-exclusive, non-transferable, revocable, royalty-free license
("License") to use Chevron's 2D Synthetic Seismic Dataset, including all data and
supporting information related to the 2D Synthetic Seismic Dataset, such as
without limitation, Vp well log, starting Vp model, seismic wavelet,
synthetic gravity data and electromagnetic data (collectively, "Data")

1.	SEG. The SEG is helping to distribute this Data,
but is not legally liable in any way because of that assistance.

2.	NO WARRANTIES.  THE DATA IS PROVIDED "AS IS," WITH NO WARRANTY OF ANY KIND.
CHEVRON MAKES NO REPRESENTATIONS OR WARRANTIES TO USER, EXPRESS, IMPLIED OR
STATUTORY, INCLUDING ANY IMPLIED WARRANTIES OF FITNESS OR FOR A PARTICULAR PURPOSE.  

3.	DAMAGE WAIVER. CHEVRON SHALL NOT BE LIABLE FOR ANY DAMAGES OF ANY KIND,
INCLUDING PERSONAL INJURY, PROPERTY DAMAGE AND CONSEQUENTIAL DAMAGES, WHICH MAY
ARISE IN CONNECTION WITH THIS LICENSE OR USER'S USE OF THE DATA OR RESULTS.

4.	MODIFICATIONS.  If User modifies this Data in any way, User must document
that the data has been modified from its original form and explain how it differs
from the original Data.

5.	REDISTRIBUTION.  If user agrees to disclose any or all of this Data with
any other person or organization, user will also provide all of the associated
documentation originally included with the Data and provided to user, including
this Data License Agreement.  If User discloses any modified version of Data,
the modification must be documented and included along with the Data.

6.	PUBLICATIONS.  User agrees to acknowledge Chevron for use of the Data
in any proposed publications, presentations or other materials that include the Data. 

7.	VOLUNTARY USE.  Use of the Data is voluntary.
Chevron will not compensate User in any way for processing, analyzing,
modifying, distributing or otherwise using the Data. 

By using the Data, User hereby accepts the above terms and conditions of this License.
If User cannot agree with the above terms and conditions, User cannot use this Data.